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© Adaptive data compression method and apparatus. 



© A method and apparatus for compressing data, particularly useful in a communication system which 
employs modems (10, 15). The preferred method is implemented in a microprocessor (30) within a modem (10, 
15), and dynamically adapts to changing data statistics. Parallel encoding (T1) and decoding (T2) tables are 
provided in a memory (35) at an encoding modem (10) and a decoding modem (15), and are updated for each 
character processed. Each table (T1, T2) has a plurality of digital compression codes associated with the 
characters of an alphabet. In response to an item of data presented for encoding, a compression code which 
corresponds to the character presented for encoding, is selected using the encoding table (T1). The selected 
compression code is provided as an output over a communication line (12). Periodically, the association between 
the codes and the characters of the alphabet in the tables (T1 , T2) is adjusted as a function of the frequency of 
occurrence of characters of the alphabet, over a plurality of characters, as maintained by a character count As 
the frequency of occurrence of characters presented for encoding changes, the more frequently occurring 
characters become associated with the shorter compression codes in the encoding table (Tt). By performing the 
same steps in parallel at the decoding table (T2) at the decoding modem (15), the compressiop codes coming in 
over the communication line (12) are used at the decoder to select a character of the alphabet, which is provided 
as a decoded output. To/from computer ^ 13 
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"Adaptive Data Compression Method and Apparatus" 



Technical Field 



The present invention relates generally to data compression, and relates more particularly to an 
adaptive data compression method and apparatus which is operative in real time and dynamically adapts to 
changing statistics of occurrence of characters in an input information stream. 



Background of the Invention 

10 

Various schemes and devices for data compression are known in the art. Many known techniques rely 
upon statistics which relate to the frequency of occurrence of elements or "characters" in a data set. 
Typically, a set or file of data to be compressed must be prescanned or processed in order to accumulate 
the statistics of occurrence of the characters in the data set, preparatory to assigning codes for representing 

75 the characters. Then, shorter codes are assigned to the more frequently occurring characters and longer 
codes are assigned to the less frequently occurring characters. 

In many prior art data compression systems, analysis of the entire data set is required before codes 
can be assigned. While these prior art data compression techniques are suitable when there is ample 
processing time available for prescanning the data, in real time applications such as data communications 

so involving a modulator-demodulator ("modem") it is not always possible or desirable to prescan data and 
establish the most efficient code. In such applications, the nature of the data may change, rendering a fixed 
compression scheme inefficient For example, modems may transmit text files, graphics files, mixed text 
and graphics, software object code, spreadsheet files, interactive communications with other systems, or 
other types of data. In order for compression to be effective, prescanning and/or a separate encoding - 

25 scheme would be required for each different type of data expected. These types of data can change 
unpredictably, even within a given transmission. In many instances, a less-than-optimum code may be quite 
acceptable to provide reduced overall transmission or data storage times as well as compression. 

Accordingly, there is a need for a data compression technique which is suitable for use in a modem or 
other real time data compression applications, wherein the types of data can change frequently and even 

30 during a transmission. 

In application systems wherein a mixture of text and numeric data is involved (such as a spread sheet 
file), when the data typB shifts to numeric from text compression efficiency will be lost since numeric files 
are more likely to be random than text files. Nonetheless, when the data type shifts to numeric, the 
probability of occurrence of the numeric digits increases dramatically compared to the probability of 

35 occurrence of text characters, so that increased efficiencies could be obtained if the the compression 
algorithm could adapt to the changing nature of the data type. 

Accordingly, some types of data compression methods are most suitable for applications wherein the 
' type of data is predictable, that is, it is predetermined that the file being transmitted is text or numbers (but 
not both), or wherein the data is not random. However, in applications wherein the type of data is not 

40 predictable, such as where there is mixed text and numbers* or of unknown type, this method will tend to 
employ a less-than-optimum code- Accordingly, there is a need to provide an adaptive data compression 
method which is dynamically adaptable to any type of data or distribution of types of data. 



45 Summary of the Invention 

The present invention provides an adaptive data compression method and apparatus which does not 
require that an input file be prescanned and does not rely upon an assumption of nonrandom data. The 
method is adaptive to redundancy in any type of data, not just data which is comprised of characters having 
so given probabilities of other characters following a given character. The method is suitable for use in a 
modem or other applications wherein real time data compression is desirable. Moreover, the resultant code 
has the desirable property of instantaneous decodability. 

Briefly described, the present invention provides a dynamically adaptive data compression method 
which relies on the periodic accumulation of data statistics and reassignment of compression codes to 
particular characters as a function of the changing probabilities of occurrence. As in many other data 
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compression techniques, the present invention works on the principle of data statistics. However, these 
statistics are accumulated "on the fly" in the present invention, eliminating the need to prescan the data to 
determine its characteristics. Data that occurs more frequently is represented by a compression code 
having fewer bits than data that occurs less frequently. 

5 Additionally, instead of representing data in bit patterns of arbitrary length, which is commonly 
employed in optimal code methods such as the Huffman code, the present invention employs a set of 
predetermined bit length compression codes. In the preferred embodiment bit lengths of four, eight, and 
twelve bits are employed so that the method may be conveniently implemented in an octet-based packet 
system or an eight-bit microprocessor. Optimum compression and performance is achieved in the present 

io invention when a greater number of data characters are represented with four bit compression codes. 

The preferred method and apparatus has the ability to adapt to changing data characteristics or types. 
Periodically, the current compression performance is evaluated, utilizing current statistics. In response 
thereto, the compression "level", that is, the numbers of characters of an alphabet of characters which are 
associated with the different compression codes, is changed to optimize performance. 

is The data statistics are kept in a linear array of registers or memory locations called "counts", where 
there is one element for each possible data character in a data set or "alphabet". These counts contain the 
number of occurrences of a given data character that occurred in a given number of processed characters. 
This array is always kept relatively sorted, so the most frequently occurring characters' counts are 
positioned toward the beginning of the array. The order in which the characters appear in the count array 

20 will determine the compression code and accordingly how many four bit nibbles (one, two or three) will be 
used to represent a character in the input data stream. A first predetermined number a of the characters of 
the alphabet are encoded as one nibble. The last 16a characters are encoded as three nibbles. The 
remaining characters are encoded as two nibbles (an eight bit byte), where a is refered to as the 
"compression level." The value of the number a is periodically recomputed as a function of the frequency 

25 of occurrence of characters, as represented by the character counts. 

More paricularly described, the preferred data compression method is performed in a programmed 
microcomputer contained in a modem, and comprises the following sequence of steps; 

(1) An encoding table having a plurality of digital representations of characters of an alphabet is 
provided in a memory associated with the microcomputer. Each of the characters has associated therewith 

30 a count of the frequency of occurrence of the character. The encoding table used to compute one of a 
plurality of compression codes for the characters of the alphabet. Some of these compression codes are 
shorter than others; the shortest codes are shorter than an uncoded digital representation of the characters 
of the alphabet 

(2) As an item of data represented by one of the characters of the alphabet is presented for 
35 encoding, a compression code which corresponds to the presented particular character of the alphabet is 

computed using the encoding table. Additionally, the count associated with the particular character is 
incremented to reflect that this particular character has occurred in the data stream. The computation of the 
compression code is a function of the position of the presented particular character in the encoding table. 

(3) The computed compression code is then provided as a compressed output. This code is 
40 transmitted by the modem over a communications link to another modem. 

(4) Periodically, the association between the compression codes and the characters of the alphabet in 
the encoding table are adjusted. This adjustment is a function of the frequency of occurrence of the 
characters of the alphabet over a predetermined plurality of characters presented for encoding, as 
represented by the character counts in the encoding table. The size of this predetermined plurality is 

45 selected to be a limited number so as to provide for speed of updating. Accordingly, as the frequency of 
occurrence of characters changes over a plurality of characters presented for encoding, the more frequently 
occurring characters become associated with shorter compression codes in the encoding table. 

As mentioned, the encoding table contains the association of characters and compression codes, and 
the association of characters and counts. Initially, the characters of the alphabet and their corresponding 

so compression codes are arranged in an initial ordered array with a set of four bit compression codes 
positioned toward the beginning of the array. The periodic adjustment of the association between codes and 
the characters of the alphabet is effectuated by the following steps: 

(1) In the encoding table, a separate "character count" is maintained for each character of the 
alphabet. 

55 (2) As an item of data is presented for encoding, and after computation of a code using the encoding 

table, the character count associated with the character presented for encoding is updated by incrementing. 



3 



0 283 735 



(3) After the character count is incremented, the character count just updated is then compared to the 
character count for a preselected character at a position* in the array more likely to be associated with a 
shorter code. In other words, the particular character count is compared to the character count for a 
preselected character positioned closer to the beginning of the array, where the shorter codes are 

5 positioned. 

(4) If the particular character count is greater than the character count of the preselected character * 
closer to the beginning of the array, the relative positions in the array of the character just encoded and the 
preselected character are exchanged. This causes association of the character having the larger character 

count with a position in the array more likely to be associated with a shorter compression code. s 
w (5) If the particular character count is not greater than the character count of the preselected 

character closer to the beginning of the array, then step (3) is repeated, by comparing the character count 
just updated with the character count one step further from the beginning of the array (that is, closer to the 
character count for the character presented for encoding). This process stops if the character presented for 
encoding is reached. 

75 Since the periodic adjustment of the corresponding relationship between the codes and the characters 
of the aplhabet is performed for each item encoded, the more frequently occurring characters to tend to 
migrate toward the beginning of the array and thereby become associated with shorter compression codes. 
In the disclosed embodiment, the preselected character is positioned a predetermined number of positions 
in the array toward the beginning of the array, sixteen positions in the preferred embodiment. Thus, the 

20 comparison and exchange amounts to a partial or limited sort, and is quick and easy to implement on a 
microprocessor since a maximum of sixteen compare operations and a single data exchange are all that is 
required. 

Decoding of the compressed data Js effectuated by a parallel decoding table maintained according to 
the same method. The fac that the encoding and decoding tables are adjusted or updated only on a limited 

25 basis results in the ability to compress and decompress at very high speeds, well within the capabilities of 
present data microcomputer circuits. Advantageously, therefore, the present invention, when implemented in 
a modem, can handle 19,200-bit per second asynchronous data communications because of the speed at 
which the tables are updated. 

Still further described, the preferred method of the present invention includes steps taken to maximize 

30 the number of characters associated with shorter compression codes. Initially, the array is preset with a 
predetermined number of characters of an alphabet associated with four bit codes, sixteen times this 
number of characters associated with twelve bit codes, and the remainder of the alphabet being associated 
with eight bit codes. The encoding level, represented by the predetermined number a, is dynamically 
altered as data statistics are accumulated, to reflect that characters having a high probability of occurrence 

35 should receive a four bit code. Accordingly, steps are taken to periodically adjust the sizes of the sets of 
numbers of characters associated with four, eight, and twelve bit codes. This involves summing groups of 
character counts to obtain "group counts" for the less frequently occurring characters, comparing the group 
counts to the character counts for more frequently occurring characters, and successively increasing the 
number of items in the set of four bit codes (i.e., the shortest codes) until there is a correspondence 

40 between a group count and a character count. In this manner, the size of the set characters associated with 
the shortest digital codes is increased to include more characters having a higher probability of occurrence, 
resulting in more optimum use of the shorter character codes. In other words, more of the shorter codes are 
assigned to more frequently occurring characters and more of the longer codes are assigned to the less 
frequently occurring characters. 

45 Yet still further described, the present invention includes a novel and advantageous method for 
representing a string of repetitive characters. As will be know to those skilled in the art, repetitive characters 
strings are freqeuntly encountered in certain types of data files transmitted by modem, for example 
spreadsheet files, the method entails the entry by both the encoder and the decoder into a "repeat state" * 
upon the occurrence of a character string exceeding a predetermined number of characters, say for 

so example, three. The repeat state is automatically entered following the transmission or recording of the 

three repetitive characters. After this repeat state is entered, a "repeat indicia" indicative of the number of « 
additional occurrences of the repetitive character (after the predetermined number) is transmitted or 
recorded. This repeat indicia can represent a number of additional occurrences up to a second predeter- 
mined number, say fifteen, or it can represent the second predetermined number, which is also indicative 

55 that the repeat state is to continue and that a subsequent repeat indicia is expected. Advantageously, data 
which frequently includes strings of repetitive characters exceeding the first predetermined number will 
realize significant compression. 

Accordingly, it is an object of the present invention to provide a data compression method and 
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apparatus which is dynamicaily adaptable to changing data statistics on a real time basis. 

It is another object of the present invention to provide an improved data compression method and 
apparatus which does not require prescanning of data in order to select an optimum code. 

It is another object of the present invention to provide a data compression method wherein the codes 
5 employed to represent information are instantaneously decodable. 

It is another object of the present invention to provide an improved data compression method and 
apparatus which is suitable for use in data communications applications such as a modem. 

It is another object of the present invention to provide an improved data compression apparatus and 
method which dynamically adapts to changing data characteristics. 
10 It is another object of the present invention to provide an improved data compression method and 
apparatus which can dynamicaily adapt to compression of text files, graphics files, mixed text and graphics, 
software object code, spreadsheet files, interactive communications with other systems, or other types of 
data, where such data types can change rapidly and even during a transmission. 

It is another object of the present invention to provide an adaptive data compression method which is 
15 dynamically adaptable to any type of data or distribution of types of data. 

It is another object of the present invention to provide an improved data compression method and 
apparatus which is also suitable for us in data recording or storage applications such as a disk drive. 

It is another object of the present invention to provide an improved data compression method and 
apparatus which may be efficiently implemented with an eight-bit microprocessor or a packet-oriented 
20 system, and can handle 19,200-bit per second asynchronous data communications. 

It is another object of the present invention to provide an improved data compression method and 
apparatus which is able to compress streaming data without adding appreciable delay to the transmission 
and/or storage of the data. 

It is another object of the present invention to provide an improved data compression method and 
25 apparatus wherein the compression performance is periodically evaluated using statistics accumulated over 
a predetermined interval or number of characters, and the level or degree oc compression changed in 
response thereto to optimize performance. 

It is another object of the present invention to provide an improved data compression method and 
apparatus having an improved repeat character state wherein repetitive consecutive occurrences of a 
so character are represented in a compressed manner. 

It is a specific object of the present invention to provide a data compression method and apparatus for 
use in a modem which is able to reduce transmitted data to about five bits on the average for each 
character of typical ASCII text files. 

It is another specific object of the present invention to provide a data compression method which is 
35 simple enough that it can be implemented on a eight-bit microcomputer and still allow the microprocessor 
to handle a two-way 19.200 bit per second synchronous channel and still be able to handle X.25 protocol 
functions. 

These and other objects, features and advantages of the present invention will become more apparent 
and be more clearly understood and appreciated from a review of the following detailed description of the 
40 disclosed embodiments and by reference to the appended drawings and claims. 



Brief Description of the Drawings 

45 Fig. 1 schematically illustrates a data communications application employing the preferred data 

compression method and apparatus of the present invention. 

Fig. 2 is a block schematic diagram of a microcomputer based circuit for implementing the preferred 
method of data compression in a modem 

Figs. 3-16 illustrate an exemplary data stream which is compressed according to the preferred 
so method disclosed herein and the resultant encoding tables at various points within the compression 
operation. 

Fig. 17, consisting of figs. 17a-17c, illustrates a method for compressing strings of repetitive 
characters. 

Appendices l-VI are tables of hexadecimal representations of character positions of an alphabet 
55 employed for representing data and the hexadecimal compression codes used to represent encoded 
compressed data. 
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Detailed Description of the Preferred Embodiment 

Referring now to the drawings, in which like numerals illustate like elements throughout the several 
figures, the preferred method for data compression is suitable for use in a modulator demodulator 

5 ("modem") employed in data communications applications, such as the modem 10 which transmits data via 
a communications link or telephone line 12 to a second modem 15. Although the disclosed preferred 
embodiment is for use in a modem, it should be understood that the present invention is suitable for use in 
any type of data compression application, for example mass data storage, database compression, telem- 
etry, video/picture compression, speech compression, seismic data compression, and the like. 

w It should be understood from the outset that the preferred embodiments of the present invention rely on 
the use of an encoding device for compressing datas and a separate but parallel decoding device for 
decoding compressed data. In accordance therewith, the modem 10 constructed in accordance with the 
preferred embodiment has associated therewith an encoding table T1, constructed in the manner which will 
be described below. Similarly, the modem 15 has associated therewith a decoding table T2. which is 

is maintained in parallel with the encoding table T1 , also as will be described below. 

Next, a few definitions are in order. It should first be understood that as the term is used herein, the 
word "alphabet" means a set of characters which are used to represent information. For example, the ASCII 
data set 45 in Figs. 3 and 4, which consists of 256 characters, is considered an "alphabet". It will also be 
appreciated that the EBCDIC standard may also be considered an "alphabet". 

20 The word "character" as used herein is not limited to a letter or numeral or digital code, but means of 
any type of informational element capable of being encoded in binary code. Most but not all. characters will 
be encoded according to a predetermined format. It will therefore be appreciated that in the ASCII or 
EBCDIC standards, a comman, a space, a numeral, and a letter are all considered "characters". Similarly, a 
numerical digit is considered a character, as is a purely binary coded representation of data not otherwise 

25 associated with any format. 

The encoding table T1 and the decoding table T2 in the preferred embodiment are both implemented 
as a data array contained in a memory 35 associated with a microcomputer 30, such as is illustrated in Fig. 
2. Typically, a modem 10 which incorporates means for effectuating the data compression method 
disclosed herein will include a data compression circuit 20. The data compression circuit 20 is connected to 
a modem computer interface circuit 22. The modem computer interface circuitry 22 is a conventional 
modem circuit and comprises circuitry for receiving a character of data over line 23 from an input source 
such as a computer, and preparing that item of data for transmission. Typically, the characters of data are 
received as an eight bit byte. This eight bit byte is provided on a data bus 25 which is connected to the 
data compression circuit 20. 

The results of data compression are provided on the data bus 25 to modem telephone interface circuit 
24, which receives data from the compression circuit 20 and conditions the data for serial transmission over 
a communications link such as the telephone company line 12. In the preferred embodiment, data provided 
from the compression circuit 20 to the modem telephone interface circuit 24 is encoded in the manner 
described hereinbelow to represent the character which was provided to the compression circuit by the 
40 modem computer interface circuit 22. 

The preferred embodiment of the data compression circuit 20 comprises a Z80 microcomputer 30 
which is responsive for receiving a byte of data from the modem computer interface circuit 22, compressing 
it, and then providing the encoded result back over the data bus 25. except directed to the modem 
telephone interface circuit 24 for serial transmission. The Z80 microcomputer circuit, manufactured by Zilog 
45 Corp., is an eight-bit microcomputer and is well known to those skilled in the art. Accordingly, further details 
of this circuit and its operation will not be provided herein, and are available in the literature supplied by the 
manufacturer. 

Additional components in the preferred embodiment operative together with the micromputer 30 include 
a program memory 31 which contains the program for the microcomputer 30. This program memory, 

so preferably a programmable read-only memory ("PROM") is connected to the data bus 25 so that program 
instructions can be transmitted to the microcomputer 30 under program control. The address bus 32 of the 
microcomputer 30, consisting of lines A0-A15, is connected between the program memory 31 and the 
microcomputer 30 so that address signals from the microcomputer can address the program memory in the 
known manner. A programmable timer/counter circuit 33 is connected to the microcomputer 30 via data bus 

55 25 for implementation of timing routines. In the preferred embodiment the circuit 33 is a Zilog Z80-CTC, 
which has four independent timer circuits, and generates unique interrupt vectors to facilitate interrupt 
driven timing routines. 

The encoding table T1 (or decoding table T2, in the case where decompression is to be performed) is 
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contained in a random access memory ("RAM") 35. which is connected to receive address signals on the 
address bus 32 and to receive and transmit data signal over data bus 25. In one embodiment, the random 
access memory 35 is a 1024 x 8 bit memory, arranged as 512 x 16. This particular configuration readily 
allows for a 256 character alphabet: eight bits are provided for each character of the alphabet, eight bits are 

5 provided for a character count for each character of the alphabet, and twelve bits (three four-bit nibbles) are 
computed for the compression code for each character. This arrangement therefore employs 256 x 28 = 
7168 bits of 8192 provided. It will of course be understood that the size of this memory may be smaller or 
larger depending upon the size of the alphabet, the number of bits used to encode the alphabet, and the 
number of bits used to represent the character count. 

w In the preferred embodiment, the compression code of twelve bits is computed in real time by 
microcomputer 30 rather than stored. This configuration allows random access memory 35 to be 256 x 16 
= 4096 bits, for storage of eight bits for each character of 256 character alphabet and eight bits for a 
character count for each character of the alphabet. 

Control signals associated with the microcomputer 30 are not illustrated in Fig. 2, inasmuch as such 

is elements will be known to those skilled in the art. However, it should be understood that since the preferred 
embodiment of the present invention operates to compress each character of an input data stream, each 
character to be encoded is provided by the modem computer interface circuit 22 for compression, one 
character at a time. The presence of a character for compression is signalled by an interrupt (INT) on line 
36 which is provided from the modem computer interface circuit 22. In a similar manner, the modem 

20 telephone interface circuit 24 provides the INT signal when the system is employed in the decoding mode; 
the respective INT signals from the modem circuit 22, 24 are wire-ORed so that either may generate an 
interrupt. The interrupt signal informs the microcomputer 30 of the presence of data for encoding or 
decoding, as the case may be, on the data bus 25. 

Also included within the* modem interlace circuits 22 and 24 are control registers 27, 28. respectively, 

25 which are employed to assist in the interface between the modem interface circuits and the data 
compression circuit 20. The control registers 27, 28 serve the function of providing various control signals 
needed to interface with the microcomputer 30. For example, when the microcomputer 30 is conditioned to 
be in an encoding state, the microcomputer must acknowledge the receipt of a character from the circuit 22 
for encoding. Accordingly, the microcomputer 30 responds in a handshaking fashion to an interrupt from the 

jo modem computer interface circuit 22 by setting a bit in the control register 27 to acknowledge the interrupt 
and signal that the microcomputer is "busy". When the microcomputer is able to accept another character 
for encoding, this "busy" bit is cleared, and the modem computer interface circuit is then conditioned to 
provide another character for compression and ultimate transmission. 

In a similar fashion, control register 28 facilitates the interface between the microcomputer 30 and the 

35 modem telephone interface circuit 24. For example, when a coded compressed representation of a 
character is ready for transmission, the microcomputer 30 sets a bit in the control register 28 of the modem 
telephone interface circuit to signal this circuit that a code is ready for transmission, and has been placed 
on the data bus 25 for strobing into an appropriate storage register (not illustrated) in the circuit 24. 

It will also be understood that when the system is configured in the decoding mode, the data paths are 

40 reversed from that just described. Data will be received from the telephone line 1.2, provided through the 
modem telephone interface circuit 24 to the microcomputer 30 for decompression, and then provided to the 
modem computer interface circuit 22 for provision over fine 23 to the computer. Accordingly, data received 
over the telephone line 12 is provided through the modem circuit 24 and converted into an eight bit byte for 
provision over the lines 25 of the data bus to the compression circuit 20, which is placed in a "decoding" or 

45 decompression state. The control register 28 serves a handshaking function similar to that performed by the 
control register 27 as the data is communicated between the modem telephone interface circuitry 24 and 
the compression/decompression circuit 20. 

Next will be a discussion of the theory and operation of the preferred method and apparatus for data 
compression. Generally speaking, the preferred data compression method comprises a sequence of steps 

so which are implemented as a program for the microcomputer 30. These steps may be described as follows: 
(1 ) An encoding table T1 having a plurality of entries associated with characters of an alphabet 45 is 
provided in the RAM 35. Also provided in the encoding table is a count (cc) associated with each character 
of the alphabet. The alphabet in the preferred embodiment has n~ 256 characters, and comprises the 
ASCII data set; the encoding trable T1 is provided in a compressing or transmitting modem 10. The 

55 encoding table is. used to compute a compression code 60 (see Appendix III), in the manner which will be 
described below. The compression codes 60 have a predetermined number z of bits, but there are a 
plurality of sets 50 (Appendix lll)of compression codes provided in the preferred embodiment. The number 
z is the same within each set of compression codes, but varies between sets 50 of compression codes. 
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Some of these compression codes 60a are shorter than others 60b; the shortest codes are shorter than an 
uncoded representation of the characters of the alphabet. In the disclosed embodiment, there are three sets 
of compression codes: a set of four bit codes 50a. a set of eight bit codes 506, and a set of twelve bit 
codes 50c. Thus, z is four, eight, or twelve. 

5 (2) Items of data 70 (Fig. 3) for encoding are presented in a sequential data stream. Each separate 

item 70a. 706.... may be generally referred to as an item x. As an item x of data represented by one of the * 
characters of the alphabet is presented for encoding, a compression code 60 which corresponds to the 
presented particular character of the alphabet is computed based on the order number of the item x in the 
encoding table and the compression level a. ^ 

io (3) The computed compression code is then provided as a compressed output. This code is 

transmitted by the modem over the telephone line 12 to the receiving modem 15. 

(4) Periodically, the relationship between the compression codes and the characters of the alphabet 
in the encoding table T1 are adjusted. The adjustment only occurs for characters which occur relatively 
frequently when compared to other characters, and involves changing the compression code associated 

;s with a particular character. The adjustment is based upon the frequency of occurrence of the particular 
characters of the alphabet over a predetermined plurality of characters presented for encoding, as 
represented by the character counts in the encoding table. The size of -this predetermined plurality, 
represented by the number k, is selected to be a limited number so as to provide for speed of updating. 
Accordingly, as the frequency of occurrence of characters over k characters presented for encoding 

20 changes, the more frequently occurring characters become associated with shorter compression codes in 
the encoding table. 

As described, the encloding table Tl contains the characters of the alphabet and the counts associated 
with the characters, and may contain the compression codes associated with the characters if the 
compression codes are precomputed. Initially, the characters of the alphabet and their corresponding 
25 compression codes are arranged in an initial ordered array, with a set of a four-bit codes 50a positioned 
toward the beginning of the array. There are b eight bit codes 50b and c twelve bit codes 50c. The periodic 
adjustment of the association between compression codes and the characters of the alphabet is effectuated 
by the following steps: 

(1) In the encoding table Tl, a separate "character count", referred to as CC, is maintained for each 
30 character of the alphabet. 

(2) As an item 70 of data x is presented for encoding, and after computation or selection of a 
compression code using the encoding table, the character count CC associated with the character 
presented for encoding is incremented. 

(3) After the character count is incremented, the character count just updated CC1 (Fig. 5)is then 
35 compared to the character count CC2 for a preselected character at a position in the array more likely to be 

associated with a shorter code. This preselected character in the disclosed embodiment is d places closer 
to the beginning of the array; d is sixteen in the preferred embodiment In other words, the just-updated 
particular character count is compared to the character count for a preselected character positioned closer 
to the beginning of the array, where the shorter codes are positioned. 

40 (4) If the character count of the preselected character closer to the beginning of the array is less than 

the particular character's count, the relative positions in the array of the character just encoded and the 
preselected character are exchanged, as shown in Fig. 4 after the exchange occurs. If the character count 
of the preselected character closer to the beginning of the array is not less than the particular character's 
count, the same comparison is made with each character count moving closer to the position of the count 

45 just updated, until a smaller character count is found and the described positional exchange is made, or 
until all d counts have been compared. This causes association of the character having the larger character 
count with a position in the array more likely to be associated with a shorter code. 

It should be understood at this point that particular compression codes are associated with particular * 
positions in the array, and the exchanging positions of characters entails moving a character and its 

so associated character count to a different position in the array so that the character becomes associated with 

the compression code formerly associated with the character having the lower character count. ^ 

Since the periodic adjustment of the corresponding relationship of the compression codes and the 
characters of the alphabet is done for each item encoded, the more frequently occurring characters tend to 
migrate toward the beginning of the array and thereby become associated with shorter codes. This 

55 migration is relatively gradual, since a character can move no more than d- 16 positions at one time. Thus, 
the comparison and exchange amounts to a partial sort, and is quick and easy to implement on a 
microprocessor since only a small number of compare operations and data exchanges is required. 

Still further steps are taken in the present invnetion to maximize the number of characters which are 
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associated with shorter codes. This involves adjustment of the sizes of the sets of compression codes 50a, 
50b, 50c, i.e., the values of a, b, and c. In the disclosed embodiment, the values of b and c are functions of 
a. Initially, the array in the preferred embodiment is preset with a predetermined number a of characters of 
the alphabet being associated with four bit codes, sixteen times this number being associated with twelve 

5 bit codes, and the remainder of the alphabet of characters being associated with eight bit codes. The 
"encoding level" represented by the predetermined number a of characters associated with four bit codes, 
is dynamically altered as data statistics are accumulated, to reflect that characters having a high probability 
of occurrence should receive a four bit code in order to realize greater compression. Accordingly, steps are 
taken to periodically adjust the size of the sets of numbers of characters represented by four, eight, and 

w twelve bit codes. This involves the following steps: 

(1) For a given plurality k of items presented for encoding, 128 in the preferred embodiment, the 
character counts for predetermined groups of consecutive characters are summed to obtain a plurality of 
"group counts" 80 (Fig. 11). in the preferred method, these group counts or sums are obtained by breaking 
the 256-character alphabet into groups of sixteen, thereby providing sixteen groups. Thus, there are sixteen 

is group counts. 

(2) The group counts for groups toward the end of the array are then successively compared with the 
character counts (CC) for single characters at the beginning of the array, beginning with a comparison of 
the group count of the last group of characters to the character count for the first and most frequently 
occurring character. 

20 (3) The number of items a in the set of four bit codes 50a. which are the shortest codes, is increased 

until there is a correspondence between a group count and a character count, moving from the last group 
count toward the beginning of the array. In this manner, the size of the set of characters associated with the 
shortest digital codes 50a is increased to include more characters having a higher probability of occurrence, 
resulting in more optimum use of the shorter character codes (compare 50a in Fig. 11 with 50a' in Fig. 12). 

25 Essentially, this entails assigning more of the shorter codes to more frequently occurring characters and 
more of the longer codes to the less frequently occurring characters. In the preferred method, there are 
three sets of codes, and the set of a members associated with the four bit codes varies in size between 
zero and fifteen members, while the set of cmembers associated with the longest digital codes varies in 
size between zero and 240 members. 

30 As has been described, the data compression method in the preferred embodiment is effectuated with a 
program for the microcomputer 30. The following more detailed description of the preferred method for data 
compression may be implemented, as a series of program instructions for the microcomputer30 t and after 
the following discussion, those skilled in the art will understand how to program the microcomputer 30 to 
accomplish the objectives of the present invention. Reference will also be made to Figs. 3-16 in the 

35 discussion which follows: 

(1) First, an alphabet 45 of n characters must be selected to be employed in representing 
information. As discussed above, this alphabet of characters can be any predetermined set of n characters. 
In the preferred embodiment the set of ASCII characters is employed inasmuch as the data communications 
involving this character set is one primary area of usefulness for the present invention. The alphabet in the 

40 preferred embodiment comprises n = 256 eight bit ASCII characters. Such an alphabet 45 is shown in Fig. 
3, wherein a carat ( A ) preceding a character indicates an ASCII control character, a "v" preceding a 
character indicates an ASCII graphics symbol, and a plus (" + ") preceding a character indicates an italics 
ASCII character. 

(2) The characters are first arranged in an initial order array within the memory 35 in some 
45 predetermined order to form the table T1. In Fig. 3, the characters are arranged in order of the hexadecimal 

representation of the ASCII data set. Inasmuch as the preferred method disclosed herein is dynamically 
adaptive to changing data statistics, it is not necessary that an attempt be made to arrange the characters 
in any particular order. Nonetheless, in some applications it may be expected that certain types of data are 
more likely to occur than others, for example it may be known in some circumstances that English language 

so text is more likely to be encountered. In such applications, then an attempt may be made to order the 
characters in order of descending expected probability of occurrence, with the expected most frequently 
occurring character positioned at the beginning of the array and the expected least frequently occurring 
character at the end of the array. Such a prearrangement allows more rapid adaptation of the algorithm to 
the most likely data stream to be encountered. 

55 (3) Next, there must be provided a set of compression codes 60 to be used for encoding the 

characters of the alphabet in a compressed fashion. In the preferred embodiment, there are three sets of 
• compression codes used to encode characters: those with a four bit nibble 60a' t those with an eight byte 
60b, or those with twelve bits 60c (Appendix 111). Generally speaking, however, the compression codes have 
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z bits, and there is provided at least a first set of p-bit compression codes for encoding some of the 
characters of the alphabet and at least another set of r-bit compression codes used for encoding other ones 
of the characters of the alphabet In the preferred embodiment, a first set of p-bit compression codes is 
used for encoding a first predetermined number a of the n characters, a second set of g-bit codes are used 

5 for encoding a second predetermined number b of the n characters, and a third set of r codes are used for 
encoding a third predetermined c of the n characters. In this particular situation, a + b + c = n, since all of 
the characters must be represented one of the three different sets of codes. In the preferred embodiment, z 
can take on one of three values: p = 4, q = 8, and r = 12. 

It should also be understood that in general, there are at least two sets of compression codes of 

10 different lengths, wherein the first set of 50a of a characters have p-bit codes, and the second set 50c of c 
characters have r-bit codes. It just so happens that there are three sets of compression codes in the 
preferred embodiment although it will be understood that more sets may be used. 

The compression codes employed in one embodiment are illustrated in Appendices l-VI, for six 
exemplary compression levels. In these appendices, the columns labeled IN represents the position of a 

is character in the array of table T1. as indicated by a hexadecimal positional identifier ranging from 0 to FF. 
The compression codes 60 associated with the positions in the array are shown in the columns labeled 
OUT, and are hexadecimal representations also. It should be understood that in the preferred embodiment, 
the compression codes shown in the Appendices are computed in real time based upon the position of a 
presented character in the encoding table. The method for real time computation is discussed in detail 

20 below. However, the codes can be precomputed and stored in a memory in the form as illustrated in the 
Appendices. 

It should also be understood that these compression codes are not a static table of codes; rather, the 
codes are dynamically altered as a function of the compression level. Referring in this regard to Appendix I, 
it will be noted that this table represents a compression level wherein a = 1 t that is. only one character 
25 receives a four-bit compression code. This code is the hexadecimal 0 in the first OUT column. In 
accordance with the disclosed method, there are c =M6a = 16 codes of twelve bits. These 60c may be seen 
in the fast OUT column of Appendix I. and range from hexadecimalFFO to FFF. The remainder of the codes. 
b = n-a-c = 239 codes are eight bit codes, ranging from hexadecimal 10 in column 1 to FE in the last 
column. 

30 The compression level wherein a = 0 represents one extreme in the preferred embodiment In this case. 

all data is represented by eight bit compression codes, and no compression (or expansion) takes place. 
. This case is optimum for purely random data. Another extreme is wherein the compression level is a = 15, 

that is, there are sixteen members of the set of characters being associated with a four-bit compression 

code. This example is illustrated in Appendix VI. It" will be noted that there are a = 15 characters associated 
35 with four bit codes, ranging from 0 through E. This results in c= 16a = 16(15) = 240 characters being 

associated with twelve bit codes, ranging from F10 to FFF. and but one character being associated with an 

eight bit code, FO. 

A more frequently-encountered compression level is illustrated in Appendix V, wherein a = 12. Such a 
compression level will likely occur after the accumulation of some data statistics, and a pattern of recurring 

40 characters emerges in the data stream being processed. In this appendix, eight characters are associated 
with four bit codes, ranging from 0 to 8. There are c= 16a = 16(12) = 192 characters associated with twelve 
bit compression codes, ranging from F40 to FFF. This leaves 256-12-192 = 52 characters for eight bit 
compression codes, ranging from hexadecimal CO to F3. 

It will be observed from an inspection of the Appendices l-VI that the interpretation of any given 

45 compression code is a function of the compression level a. That is, given a compression level and up to 
twelve bits of data which represents a compression code, a position in the table (and also the correspond- 
ing character of the alphabet) is uniquely identified. In yet other words, all compression codes are 
instantaneously decodable. This is possible because the compression level, a, determines how many 
characters have four bit codes, and the values of a determines the values of b and c, the numbers of 

so characters having eight and twelve bit codes, respectively. Thus, if the compression level is 12, as in 
Appendix V, the first twelve hexadecimal numbers in the first OUT column 0 through 6 represent an entire 
code; if a C is encountered, it necessarily follows that at least one more hexadecimal digit follows, since the 
number of four bit codes has been exhausted after 8. In like manner, after the next 52 characters receiving 
eight bit codes have been exhausted, which occurs after the compression code F3 in Appendix V. it 

55 necessarily follows thata twelve bit code is expected, requiring three hexadecimal digits, F40 et seq. 

(4) The next step taken, after establishing these initial parameters, is to provide a sequential input 
information stream 70 represented by the characters of the alphabet to be encoded with the compression 
codes. 
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(5) For each character 70a. 70£>...of the input information stream provided sequentially for encoding, 
referred to as an item x. a number of steps are taken. These steps are as follows: 

(i) a compression code using the encoding table T1 is computed or chosen as a function of the relative 
position in the array of the character corresponding to the item x. Essentially, this involves indexing, 

5 hashing, or conducting a sequential search into the table T1 until a match is found between the character 
representing the item x and the same character in the Table Tl. This uniquely identifies a compression 
code. For example, and referring now to Fig. 3, the exemplary data stream "The Hayes Compression..." 
begins with the character T. It will be seen that this character occupies a position in table T1 in the. sixth 
column, the fifth character down, and corresponds to hexadecimal position 54 in the IN columns of the 

jo Appendices. Accordingly, for encoding of this character, the compression code associated with the 
compression level associated with the position hexadecimal 54 is selected or computed and provided as a 
compressed output. 

In the preferred embodiment, as has been mentioned, the compression codes are computed in real 
time by microcomputer 30. Computation in real time is preferable because without it there are sixteen 
ts possible compression code tables, and each would have to be stored in ROM or RAM, requiring 16 x 256 x 
8 = 4,096 bytes of storage. Alternatively a single compression code table could be provided, but it would 
have to be updated each time the value a changed. 

* The method for computing the compression code employed in the preferred embodiment is as follows. 

The position of the submitted data item x is found in the table T1. This allows ready identification of the 
20 compression level. If the position of x, called here p(x), is less than a, the compression code is the four bit 

nibble equal to p(x). For example, referring to Appendix IV, if the character occupies position4 in the first IN 

column, the compression code 4 is the appropriate result. 

If p(x) is greater than (255-1 6a), then a three-nibble code is sent, hexadecimal F followed by the eight 

bit number p(x). For example, with a = 8 in Appendix IV, a character occupying position AO in the sixth IN 
25 column receives the compression code position AO in the sixth IN column receives the compression code 

FAO. 

Otherwise, the two-nibble code p(x) + 15a is sent. For example, a character occupying position 60 in the 
fifth IN column in Appendix IV receives the compression code (60 + 15(8) = O 8 hexadecimal. AH of these 
computations are fast and easy to implement on a microcomputer since multiplication by sixteen is four 

30 shift-left operations, while multiplication by fifteen is four shift-lefts and subtract the original byte. 

In Rg. 3, the compression level is 4, which corresponds to the compression codes in Appendix ill. As is 
evident in Rg. 3, the character T results in the selection of a q bit code, which is provided as an output It 
will be understood that the table shown in Rg. 3 does not have the appropriate codes of Appendix lit shown 
therein. In Appendix ill, which is the table of codes for compression level four, the hexadeciaml code 

35 corresponding to the character position for 7" in the array of Rg. 3, which is represented by the hexadecimal 
characters 54, results in the computation or selection of the hexadecimal compression code 90. It will be 
understood that after the appropriate character position for an item to be encoded is found in the exemplary 
table T1 of Rg. 3, the corresponding compression code is computed, or alternatively selected from 
Appendix III. 

40 (ii) Tn© n sxt step taken for each item of the input information stream is to increment a character 

count associated with the character of the alphabet which represents the particular item of the input 
information stream. This character count, referred to as CC". comprises the vehicle for maintaining data 
statistics on the frequency of occurrence of the character. 

(iii) After the character count 00 has been incremented, it is compared with the character count 

45 associated with each character within a predetermined range higher, that is toward the beginning, in the 
array. This comparison is conducted up to a predetermined number d places in the array, where d is a 
pdetermined number less than n. In the preferred embodiment this comparison is conducted up to sixteen 
placed in the array and no more, primarily so as to not add an appreciable time delay resultant from 
processing time. Assuming a clock frequency of 2.5 MHz for the microcomputer 30, and the usage of the 

so CPIR instruction ("compare, incremend, and repeat", requiring at most 21 clock cycles), a search and 
comparison of up to sixteen memory locations can be conducted within about 135 microseconds employing 
the preferred Z80 microcomputer. Although other instructions must be executed for each character 
processed, the operations associated with the character counts comprise a significant portion of the 
processing time for each character. Accordingly, it will be appreciated that the 19,200 bits per second target 

55 bit rate (which corresponds to 1920 characters per second) for a modem employing the present invention, 
which corresponds to an average time of about 416 microseconds per eight bit character processed, will not 
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be impacted. Of course, if a faster microcomputer is employed the impact will be even less. 

Accordingly, in Fig. 3. the character count CO for the character 7 is compared to the character count 

for each character above the 7 in column six up to the character 0 in column five. 

(iv) When the character count CC is found to be greater than or equal to the character count for a 
5 character closer to the beginning of the array within the predetermined range of d characters, an exchange 

is performed. The relative positions in the array of the character corresponding to the item x and the - 3 

character lower in the array having the equal or lesser character count are then merely swapped in position. 

producing the result as shown in Fig. 4. Inasmuch as after the first character of the exemplary data stream. 

only the character 7 has a character count, the exchange results in a pairwise swap of the positions of the ? 
10 characters 7 and D. 

The effect of incrementing the character count in response to the occurrence of a character, and 
comparison of the character count to the character count for each character within a predetermined range 
lower in the array, is to provide a partial sort, so as to gradually order the probabilities of occurrence of 
particular characters in descending order. Although the partial sort is only within a predetermined range and 
rs not a complete sort, the fact that the comparison is done for each character results in a gradually sorted 
table. Moreover, the more frequently a character occurs, the more frequently it is repositioned within the 
table, so that the effective sorting rate for these characters is greater. This of course provides the desired 
result of a more rapid approach to use of a four bit compression code for those characters having higher 
probabilities of occurrence. 

20 It will now be appreciated that the foregoing actions are taken for each character presented for 
encoding. Of course, it will be understood that in Fig. 4, after the pairwise exchange illustrated therein, the 
character 7, now being associated with position five in the fifth column, is now at a position corresponding 
to IN position 44 in Appenxid II! and receives the OUT compression' code 80, which will be be provided on 
the next occurrence of the character 7. The character D now is associated with the fifth position in the sixth 

25 column, corresponding to the position 54 in Appendix III, and receives the compression code 90. 

Referring next to Figs. 5 and 6, the states of the table T1 after the second and third characters of the 
exemplary data stream may be seen. After the second character in the exemplary data stream, which is an 
/?, the character count associated with the h is incremented. When comparing the state of the character 
counts after the processing the h, it will be observed that the count for h is greater than the count for W. 

30 resulting in the exchange of positions as illustrated in Rg. 5. In a similar manner, when processing the third 
character in the exemplary data stream, an e, the count for e is incremented and compared to the count for 
U in Fig. 5. Since the count for e is higher than the count for tf, the positions of the e and the U are 
exchanged, resulting in the placement as illustrated in Fig. 6. 

Referring next to Figs. 7 and 8, the conditions of the table T1 before and after the end of the first line of 

35 the exemplary data stream are illustrated. It may be seen form an inspection of these figures that data 
statistics have begun accumulating in a significant manner, and that a significant shifting of character 
positions has taken place. For example, it will observed that by the end of the first line of the exemplary 
data stream, the space character sp is the most frequently occurring character, having occurred nine times 
on the first line. The charactero is the next most frequent character, having occurred six times. It will also 

40 be noticed between Figs. 7 and 8 that the character a, the last character on the line, continues to move 
toward the beginning of the array in the manner described as its character count increases. 

By the end of the second line, as illustrated in Figs. 9 and 10, the table T1 has been significantly 
sorted, with the characters now arranged in an apparent order of decreasing probability of occurrence. It 
should be recalled that at the point of Figs. 9 and 10, the initial encoding level of four, wherein only four of 

45 the most frequently occurring characters are encoded with four bits, still remains. It will thus be understood 
that only the characters, sp, s, e. and f are encoded with four bits, while virtually all the remaining 
characters which have occurred in this exemplary data stream are encoded with eight bits. 

Next will be provided a more detailed discussion of the steps taken in the preferred embodiment to * 
change the encoding level periodically. The term "encoding level", it will be recalled, means simply the 

so value of the number a of characters which are associated with the shortest code. Changing the encoding 

level allows the maximization of the numbers of characters associated with shorter codes. Initially, the array * 
is preset with a predetermined number a of four bit codes. b = 16a twelve bit codes, and the remainder of 
the alphabet, c = n-a-d, being associated with eight bit codes. The encoding level is dynamically altered as 
data statistics are accumulated, to reflect that the more characters there are that have a high probability of 

55 occurrence, the more characters should receive a four bit code. 

Accordingly, steps are taken to periodically adjust the size of the sets of numbers of characters 
represented by four, eight, and twelve bit codes. In the preferred embodiment, this periodic adjustment 
occurs after a predetermined number k characters where k is 128 in the preferred embodiment. However, it 



12 



0 283 735 



is not necessary that the adjustment of the encoding level be done after a predetermined number of 
characters or after a predetermined time; rather, the important concept is that during the actual compres- 
sion and/or transmission or storage of data, the statistics pertaining to the data being compressed are 
analyzed so that the encoding scheme may dynamically adapt to the possibility that data statistics change, 
s This periodic adjustment could occur as a function of time as easily as a function of the number of 
characters being processed, or in any other mannder during the processing as may occur to those skilled in 
the art. 

In the preferred method, the encoding level is changed by adjusting the relative values of the numbers 
a, b, and c; the numbers b and c are functions of a. The preferred steps for adjusting the encoding level 
w are as follows: 

(i) First, the character counts for consecutive groups of characters are obtained, yielding a plurality of 
"group counts". In the preferred method, consecutive groups of m characters are summed to obtain a 
plurality of mm = 256/16 = 16 group counts. As may be seen in Fig. 11, the group counts correspond in 
the preferred method to sums of columns of 16 characters, when the array is viewed two dimensionaliy 

?5 rather than linearly. 

(ii) The group count for each group, beginning at the rightmost position in table T1, is then compared 
to successive character counts beginning at the top of the array. In other words, the rightmost group count 
in Fig.11 is compared to the character count for the most frequently occurring character sp. the next 
rightmost group is compared to the next most frequently occurring character e, and so on. (Note: the group 

20 counts are zero for all groups until the tenth column from the right in Fig. 11, so these group counts are not 
all shown). Mathematically stated, this comprises comparing the group count for group (n/m)-M to the 
character count for the character /, where / is an integer index of group counts beginning at zero, and where 
/ is an integer index into the array beginning at zero at the beginning of the array. It will thus be understood 
that there are 0-1 5 groups. 

25 (iii) In the event that the group count being compared to the character count is less than the 

character count, then the next column or group count moving leftward is compared to the next character 
count. Stated more precisely, if the group count for the group (n/m)-M is less than the character count for 
character ] t then / and / are incremented and the comparison step is repeated. The steps of comparison 
and incrementing are repeated until there is a correspondence between a group count and a character 

30 count. 

(iv) When the group count matches the character count for a particular character, then the 
comparison stops, and an encoding level is established as a function of the number of comparisons made 
until correspondence was found. For example, it may be seen in Fig. 1 1 that the group count for the eighth 
column from the right which is zero, equals the character count for the character a h. At this point, it will 
35 be observed that the frequency of occurrence for the characters which follow the A H is approximately the 
same as the frequency of occurrence of any character within the group being compared. Thus, in the 
preferred method, the comparison stops at this point, and an encoding level is established on position 
immediately preceding the point at which there was a correspondence between the group count and the 
character count. 

40 In other words, and referring now in this regard to Fig. 12, a new encoding level of a =8 characters is 
established since the character A M, the last character in the set of characters which now receive four bits 
coding, is one character lower in the array than the point of correspondence between the group count and 
character count shown in Fig. 11. Note further in Fig. 12 that the positional exchange between characters 
continues after the level calculation, as it will be noted that the character A J, which in ACSII represents a 

45 line feed character, has exchanged places with the character A H, 

Still further steps are taken, as shown in Fig. 12, in response to the establishment of a new encoding 
level of a =8. Firstly, it will be noted that the computation of a results in c = 16a = 128 characters now 
receiving twelve bit compression codes, and b = 256 -a -c = 120 characters now receiving eight bit 
compression codes. 

so Furthermore, it will be noted in Fig. 12 that the character counts have been divided in half (with 
downward rounding) after the level recalculation. Rather than completely resetting the table T1 at the 
beginning of a new set of k characters, and rather than maintaining all of the prior history of the preceding k 
characters, in the preferred method only some of the prior history is preserved. This is accomplished by 
halving the character counts at the beginning of each new set of k characters, which serves the dual 

55 function of preserving some prior history of the preceding characters while not biasing the table so heavily 
that the speed of adaption to changing data statistics is inhibited. Although the preferred method involves 
halving the character counts, those skilled in the art will understand and appreciate that the amount of prior 
history preserved and the speed of adaptation of changing data characteristics can be adjusted to a degree 
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by the amount by which the character counts is reduced. Reducing the character counts by half is 
particularly convenient to execute in the preferred embodiment since ait that is required is a shift operation 
performed at each memory location containing a count, a task which is simple to implement with the 
preferred Z80 microcomputer. 

s Figs. 15 and 16 illustrate the second level recalculation, which is taken after the next 128 characters. 

that is, after the word in on line 5 in the exemplary data stream. It will be observed in Fig. 15 that when 5 
comparing the group counts, beginning in the right of table T1 and moving leftwardly, to the character 
counts, beginning at the most frequently occurring character sp and moving downwardly in the array, there 
is a correspondence between character counts and group counts for the character / (with a character count ^ 

jo of four) and the group three (the fourth group from the left; the group count is four). This results in the 
establishment of a new encoding level of a = 12 characters, such that all characters lower than a m in the 
first column of Fig. 16 thereafter receive four bit compression codes, selected according to Appendix V. 
Correspondingly, 192 characters receive 12 bit compression codes, while the remaining 52 characters are 
encoded with eight bits. 

75 In order to provide for decompression, basically all that is required is to maintain a parallel decoding 
table T2 at the receiving end, such as in modem 15 in Fig. 1. In order to decompress encoded data being 
received, the decoding table begins with an identical initial table such as set forth in Fig. 3. Then, the exact 
same steps as described above for maintaining the encoding table T1 are followed on an item by item basis 
as for the decoding table T2, with the exception that the item being processed is an encoded character, and 

20 the objective is to select and provide the appropriate character of the alphabet as a decoded output 

The method of decompressing data compressed in accordance with the present invention is typically 
performed at the other end of the communications link. A parallel second alphabet of n d characters 
duplicative of the first alphabet is provided, when n tf is the same as the number n of characters in the first 
alphabet, with the subscripted "d w signifying that the operation is performed at the decoding or de- 

25 compressing apparatus. The n rf characters of the second alphabet are arranged in an initial ordered array 
identical to the array of the first alphabet. A sequential data stream encoded with the compression codes is 
received over the communication link. A character in the alphabet corresponding to the compression code 
is then looked up. The output provided is a character of the alphabet corresponding to the compression 
code. Finally ,the order of the array for the second alphabet is maintained in the manner as set forth above 

30 for the alphabet in the transmitting or encoding modem. Accordingly, as the frequency of occurrence of 
characters in the input information stream being encoded changes, the more frequently appearing cor- 
responding characters in the second alphabet tend to migrate toward the beginning of the decoding array in 
parallel with the encoding array. 

In the preferred embodiment, this entails usage of an inverse algorithm to the one employed for 

35 computing the compression code. The inverse algorithm, which is embodied as a program for the 
microcomputer at the receiving end, will be described next. Four bit nibbles of data are the minimum packet 
size; N1 refers to the first nibble received, N2 the second, and N3 the third. Moreover, let the notation N1N2 
mean the value 16'N1 + N2, for brevity. Then, the steps taken for decoding are as follows: 

If N1 is less than a, the current compression level, the output character is the character in the position 

40 N1 in the table T2. Otherwise, N1 is saved and the system awaits the arrival of N2. 

If N1N2 is less than 256-a, then the output character is the character in the position (N1N2-I5'a) of 
table T2. Otherwise, N1 and N2 are saved, and the system awaits the arrival of N3. 

If a third nibble is received, N1 is discarded (which is always an f). and the character in the position 
N2N3 is provided as the decompressed output. 

45 Thus, as data is presented for decoding, a character is selected from the decoding table T2 which 
corresponds to the particular digital code represented by the presented item of data. The selected character 
in the decoding table T2 is provided as a decoded output. In the same manner as for the encoding table, 
periodically the spatial relationship of a codes within the array and the characters of the alphabet are * 
adjusted as the function of the frequency of occurrence of the characters, after the characters have been 

so decoded. Accordingly, and in the same manner as for the encoding table, as the frequency of occurrence of 

the decoded characters over a plurality of characters changes, the more frequently occurring characters ^ 
become associated with the shorter codes in parallel with the codes in the encoding table. 

It will be appreciated that the same method must be performed by the decoder as is performed by the 
encoder, and in synchronism, in order for proper operation. In order to ensure synchronized operation, in 

55 the preferred embodiment, the following sequence of operations is performed by the encoder and the 
decoder: (1) a character is encoded according to table T1 and transmitted by a transmitting modem, (2) the 
transmitting modem updates its table T1, (3) the receiving modem decodes the received character 
according to its then-current table T2, and (4) the receiving modem updates its table T2, 
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Additional means are provided for resetting, to ensure that operation by the encoder and decoder begin 
in synchronism. In the preferred embodiment, one of the seldom-used ASCII character codes is employed 
as a transparent "flag" or "escape" character while provides a plurality of control codes. One of these 
codes is use for reset. During set-up. the transmitting modem resets its table to a known initial state, and 
5 then transmits the reset character. This causes the receiving modem to reset its table T2 to the same initial 
state as Table T1 . 

For resetting and for other reasons, it is necessary to send control information from the data 
compression circuit to the decompression circuit. In the preferred embodiment control information is send 
using a "flag" byte; hexadecimal FA is used in the disclosed embodiment. This byte is injected into the 
w output stream when control functions are necessary. This "flag" byte is always followed by a "flag code" 
byte, which carries the control information. A "flush" operation, to be described next, must be done before 
a flag byte is injected. 

In the preferred embodiment, three flag codes are provided, although more or fewer could be 
employed. These three include a "no-op" (no operation) code, a "D-flag" code, and a "reset" code. The 

15 "no-op" code is self explanatory; it is used in the flush operation. The "D-flag" is a command to keep the 
flag byte in the data stream. About one out of every 256 bytes in a typical data stream will by chance be 
the byte defined as the flag byte. To keep data from being last, the compression circuit must recognize 
these bytes and inject a O-flag after each to tell the decompression circuit to leave the flag byte in the data 
stream. The reset code of course is used to reset the tables to an initial state. 

20 It will of course be understood that the flag byte is optional in the present invention. Having a special 
flag byte increases the amount of data transmitted by roughly 0.4%, but the functionality it adds is 
convenient 

The preferred method further comprises steps for obtaining additional compression on occasions when 
strings of repeated characters occur within the data to be transmitted or stored. Those skilled in the art will 

25 understand that redundant data frequently occurs in strings of repeated characters, particularly in the 
communications of certain types of information files. For example, certain spreadsheet files frequently have 
imbedded therein strings of null or zero characters representing blank areas in the spreadsheet 

One conventional compression method used for handling strings or repetitive characters comprises the 
transmission of a "flag" character, signifying the entrance of a repeat state, followed by the character to be 

30 repeated, followed by a number indicative of the number of repeated instances of the particular character. 

Additional compression is realized in the novel repeat state employed in the preferred embodiment. In 
this method, a "repeat state" is automatically entered at the encoder and at the decoder after three 
consecutive occurrences 90 of the same character. Following the three identical characters will always 
occur a signal or indicium 91 indicative of either a number or a "additional repeat state". Then, no more 

35 data is transmitted until the end of the string of repeated characters, or until fifteen more repeated 
characters have been received, or until a timing signal cuases the pending data to be "finished up" when in 
a streaming mode. 

In the preferred method, a four bit "count nibble" is employed to represent the indicium 91 representing 
the number of occurrences of the repetitive characters. A hexadecimal representation of the four bit nibble 

40 may vary between 0 and F. The F represents both fifteen additional characters and the continuation of the 
repeat state, meaning that an additional count nibble follows the present count nibble. 

For example, and referring now to Fig. 17A, it may be seen that an exemplary input data stream 
includes a string of three repetitive e's. When three repetitive characters have occurred, the repeat state 90 
is automatically entered at the encoder and the decoder, and a count nibble indicium 91 Is the expected 

45 next character. The compressed data stream transmitted therefore comprises three e's 90 followed by a 
repeat indicium 91 which represents the number of repeats, zero to fifteen, which follow the initial three 
characters. If the number is fifteen (hexadecimal F), the repeat state continues; otherwise it terminates in the 
receiver after recreation of the indicated number of repeated characters. In the case of Rg. 17A, there are 
but three repeated characters in toto, so the repeat indicium is zero. 

so As another example, refer now to Fig. 17B. After the repeat state is entered, it may be seen that there 
are three additional e characters subsequent to the initial three e's. Accordingly, the compressed data 
stream transmitted comprises three e's, followed by a repeat indicium 91 of 3, indicating that three 
additional e's follow the initial three e's. 

Up to fifteen additional characters may be represented by a single repeat indicium or code. Beyond this 

55 number, another repeat code is called for. In the preferred method, the hexadecimal character F is an 
indication that fifteen repeats occur, and that an additional repeat indicium is expected. Note in Rg. 17C 
that after the repeat state is entered for the first three e characters, there are an additional fifteen e 
characters. Accordingly, the compressed data stream transmitted is threee's, followed by a repeat indicium 
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91a ofF (which indicates that fifteen additional occurrences of the character e occur), followed by a second 
repeat indicium 91 6 of 0. The zero indicates of course that there are no additional occurrences of the 
repeat character after the string of additional fifteen. 

In Fig. 170, it may be seen that a string of eighteen additional characters follows the initial three 

5 characters. This produces a compressed data stream of the initial three e's, followed by a first repeat 

indicium of F signifying the first fifteen characters and that a second repeat indicium follows, followed by a j 
second repeat indicium of 3, indicative of the final three characters in the stream. 

Finally, and referring to Fig. 17E, it should be understood that consecutive occurrences of repeat indicia 
may be employed for strings as long as may be desired. In the example of Fig. 17E. a string of thirty-two 5 

10 additional characters follows the initial three characters, tn the preferred method, this data stream is 
represented by three e's to trigger the repeat state, a first repeat indicium of F to indicate the first fifteen 
characters and to signify that and additional repeat indicium follows, a second repeat indicium of F to 
indicate the next fifteen characters and to signify that an yet another repeat indicium follows, plus a final 
repeat indicium of two to pick up the final two characters in the total of thirty-two. 

is It will now be appreciated that the preferred method for compressing an input data stream having a 
string of m repetitive characters comprises entering a repeat state by detecing the occurrence of 
predetermined number n instances of repetitive characters, transmitting data representing the predeter- 
mined number n repetitive characters, and immediately thereafter, transmitting a coded first indicium 
representing a number o of occurrences of the repetitive character subsequent to the n instances. In the 

20 preferred method disclosed herein, the sum of n and o will equal m. 

It will be understood that the method of decompressing data compressed in accordance with the 
described method of handling repetitive character strings comprises similar but reverse steps. The steps 
include (1) detecting a repeat state in received data by detecting the occurrence of the same predetermined 
number n instances of repetitive characters. (2) providing n decoded characters representing the predeter- 

25 mined number n repetitive characters, (3) following the n repetitive characters, receiving the coded indicium 
representing the number o of occurrences of the repetitive character subsequent to the n instances, (4) 
followed by providing o decoded characters subsequent to the n characters. Of course, the sume of n ando 
will also equal m in the decoder as well as in the encoder. 

It will be further understood that in the present invention, the repeat indicium represents a number 

30 between zero and a predetermined number p characters, fifteen in the disclosed embodiment, it should also 
be understood that should there be more than fifteen characters in addition to the first three in the string, an 
additional coded indicium is employed to signify the presence of additional characters. For example, in the 
preferred method, in the event that a single coded indicium is insufficient a plurality of coded indicia are 
employed to represent the total number of repeat characters. When the coded indicium is fifteen, it is also a 

as signal that an additional four bit count nibble or indicium follows, indicative or an additional number of 
repeated characters. This number can be again a number between zero and p. As in the example of Fig. 
• 17E, with a total string of thirty-five repeat characters, there is received by the decoder the first three 
characters representing the entry into the repeat state, a first repeat indicium F representing fifteen 
characters and indicating that an additional indicium follows, a second repeat indicium of F representing 

40 fifteen characters and indicating that still another repeat indicium follows, followed by a repeat indicium of of 
two, for a total of three repeat indicia. 

It should be understood that the preferred method for repetitive character compression just described is 
also provided in the preferred embodiment as a program for the Z80 microcomputer 30 of Fig. 2. 

The preferred embodiment of the present invention also is capable of an improved streaming mode of 

45 operation. Since bytes of data can be converted into an odd number of nibbles, there will be occasions 
where the last character in a string of characters is not completely transmitted because the last nibble is 
waiting for a second nibble to be added so that a complete byte can be sent. Accordingly, in the preferred 
embodiment a timer circuit or function is provided. This timer watches the duration of inactive periods and. 5 
after a predetermined period of inactivity, say 15 milliseconds, flushes unsent data out of the compression 

so circuit. The timer function is most conveniently implemented using the interrupt drive dedicated 

timer/counter circuit 33 (Fig. 2). * 

The method for the flush is as follows. These steps are performed in response to the indication that a 
predetermined time period of inactivity has occurred. If the system is in the repeat state, the "count nibble" 
is sent and the repeat state is terminated. Next, if there is an odd nibble of data remaining to be 

55 -transmitted, it is combined with the nibble hexadecimal F and sent as a complete byte. It should be noted in 
this regard that F is the only nibble that never stands alone as a single-nibble compression code. To 
prevent the extraneous F from being interpreted by the. receiver/decompressor as the first nibble of a new 
compression code for a new character, a flag byte is sent immediately. If the flush operation was triggered 
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by the timer, the flag byte is followed by the "no-op" code, indicating the receiver is to take no action other 
than discard the last F nibble, the flag byte, and the no-op byte. If the flush was due to a reset operation, 
the flag byte would be followed by the reset code. 

While it may appear that the flush operation slows down the data throughput rate since it can add two 

5 and a half bytes to the data stream, it should be remembered that it only takes place when the channel is 
idle for 15 milliseconds. Moreover, transmission of 2,5 bytes of data at 9600 bits per second (synchronous) 
takes only 2.8 milliseconds. 

It will now .be understood and appreciated that the disclosed embodiment of the present invention 
possesses several dynamic characteristics and advantages which make it suitable for a wide variety of 

10 compression applications. One such "dynamic" is the usage of a plurality of sets or alphabets of 
compression codes. In other words, each of the different compression levels, there being fifteen in the 
preferred embodiment, six of which are shown in Appendices l-VI, represents a different compression 
encoding scheme. Periodically, after accumulating data statistics, a decision is made as to which of these 
fifteen sets is best suited for the present data being processed. It will be understood that compression 

is levelO is best suited for random data (with no compression), while compression level 75, having the most 
four-bit characters, is best suited for highly regular or repetitive data. Levels 8-9 are believed to be the most 
suitable for English language text. 

Another advantage is is the fact that these accumulated data statistics are not static. Only a portion of 
the prior history of the data already processed is retained when evaluating the statistics. This allows rapid 

20 adaptation to changing data types and characteristics, while still retaining the advantages of compression. 

Yet another advantage is the flexibility to adapt to increase the compression level, and hence the 
degree of compression, in response to changing frequencies of occurrence. For example, consider the case 
where but four characters of a data set are occurring so frequently as to receive four-bit codes. If the nature 
or type of data changes so that two other characters begin to occur frequently, the compression level will 

25 be changed to assign six characters a four-bit compression code. 

Still another advantage is the speed of adaptability. Because sort operations conducted with a computer 
are relatively time-consuming, the present invention employs a partial bubble sort, conducting pair-wise 
exchanges of character positions within a limited predetermined number of characters from a given 
character within the memory array used for the tables. This uses fewer computer instructions, and results in 

30 a satisfactory overall sort rate suitable for real time applications such as a modem. The tables may not be 
completely sorted according to descending order of probability at any given time, but the more frequently 
occurring characters are processed and partially sorted more often, and move up to the top of the array 
quite rapidly. 

Stiil another advantage is the use of half-bytes for compression codes rather than arbitrary bit lengths. 
35 Such an implementation is far easier to implement on commercially available microcomputer circuits, and 
thereby results in faster' operation since variable word lengths need not be dealt with. 

Yet another advantage realized in the repetitive character string compression method is the fact that an 
escape character need not be transmitted or even employed as a repeat character, as in prior art systems 
wherein one of the set of possible characters must be dedicated for use as a repeat character. This frees 
40 up an additional character of the alphabet of characters, which may be important in applications wherein the 
alphabet of different characters is heavily utilized. 

Moreover, since the repeat state is automatically entered after a predetermined number of characters, 
additional savings in processing time are realized. In some prior art arrangements, processing time must be 
devoted to determining the number of repeated instances occur in the string; only then is the repeat 
45 character and the number of repeats transmitted. In the present invention, the predetermined number of 
characters is transmitted before entering the repeat state, requiring less processing in order to determined 
the number or repeats after the initial predetermined number of characters. Accordingly, the data flow is not 
interrupted as much as in prior art approaches. 

Last but not least, another advantage of the present invention is of course the degree of compression 
so realized. Using the preferred method for data compression described above, data compression ratios as low 
as 0.546 have been realized in test data of text files. It is believed that 30% to 40% reductions are realized 
with the 1-2-3 nibbles compression code method, while an additional file size reduction of 2% to 10% are 
realized with addition of the repeat character string method. 

Finally, it will be understood that the preferred embodiment of the present invention has been disclosed 
55 by way of example and that other modifications may occur to those skilled in the art without departing from 
the scope and the spirit of the appended claims. 
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APPENDIX III 
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APPENDIX V 



Compression Level = 12 
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APPENDIX VI 
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so Claims 

1. In a communication system including an encoding modem (10) and a decoding modem (15), each of 
the modems including a memory (35) for storing data, a controller (30). a telephone interface circuit (24), 
and a computer interface circuit (22), a dynamic data compression method CHARACTERIZED BY the steps 

55 of: 

(1) providing an encoding table (T1) comprising a plurality of digital codes (60a ( 60b) associated with 
characters of an alphabet (45). some of the digital codes (60a) being shorter than others (60b); 

(2) presenting for encoding an item of data (70) represented by one of the characters of the alphabet 
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(45); 

(3) in response to the presented item of data, selecting a code (60) from the encoding table which 
corresponds to the particular character of the alphabet represented by the presented item of data (70); 
(4) providing the selected code (60) as an output; and 
5 (5) periodically adjusting the corresponding relationship of the codes (60) and the characters of the 

alphabet (45) in the encoding table as a function of the frequency of occurrence (CC) of characters of the ? 
alphabet over a plurality of characters presented for encoding, 

whereby as the frequency of occurrence of characters over a plurlity of characters presented for 
encoding changes, the more frequently occurring characters become associated with shorter codes in the ? 
w encoding table. 

2. A data decompression method for decompressing data compressed by the method of Claim 1, 
FURTHER CHARACTERIZED BY the steps of: 

(1) providing a decoding table (T2) in parallel to the* encoding table (T1) used for compression, said 
decoding table having a plurality of digital codes (60) associated with the characters, of the alphabet in the 
is manner of the encoding table, some of the digital codes (60a) being shorter than others (606); 

(2) presenting for decoding an item of coded data represented by one of the digital codes (60); 

(3) in response to the presented item of coded data, selecting a character (65) from the decoding table 
which corresponds to the particular digital code (60) represented by the presented item of coded data; 

(4) providing the selected character (65) as a decoded output; and 

20 (5) periodically adjusting the corresponding relationship of the codes and the characters of the alphabet 
in the decoding table as a function of the frequency of occurrence (CC) of characters of the alphabet over a 
plurality of characters resultant from decoding, 

whereby as the frequency of occurrence of characters over a plurality of characters resultant from 
decoding changes, the more frequently occurring characters become associated with shorter codes in the 

25 decoding table in parallel with the encoding table. 

3. The method of Claim 1 FURTHER CHARACTERIZED IN THAT the characters in the alphabet (45) 
and their corresponding codes (60) are arranged in an initial ordered array, with shorter codes (60a)toward 
the beginning of the array and longer codes toward the end of the array, and FURTHER CHARACTERIZED 
IN THAT the step of periodically adjusting the corresponding relationship of the codes and the characters of 

so the alphabet in the encoding table comprises the steps of: 

(6) maintaining a character count (CC) associated with each character of the alphabet; 

(7) incrementing .the particular character count (CC1) associated with the particular character, 
represented by an Hem of data presented for encoding after selecting a code from the encoding table; 

(8) after incrementing the character count (CC1), comparing the particular character count to the 
as character count for a preselected character (CC2) at a position in the array more likely to be associated with 

a shorter code; and 

(9) if the particular character count is greater than the character count for the preselected character, 
exchanging the relative positions in the array of the character just encoded with the preselected character 
so as to associate the character having the larger character count (CC1) with a position in the array more 

40 likely to be associated with a shorter code. 

4. The method of Claim 3, FURTHER CHARACTERIZED IN THAT the preselected character is 
positioned a predetermined number, of positions in the array toward the beginning of the array from the 
particular character. 

5. The method of Claim 4, FURTHER CHARACTERIZED IN THAT the predetermined number of 
4S positions is sixteen. 

6. The method of Claim 1. FURTHER CHARACTERIZED IN THAT the encoding table comprises a 
plurality of sets of digital codes (50a. 506, 50c), each of the sets comprising digital codes of the same 
predetermined length within each set but different between sets, with the sets being arranged in an initial » 
ordered array associated with the characters of the alphabet beginning with a set having shorter codes 

so (50a) toward the beginning of the array and a set having larger codes (506)toward the end of the array. 

7. The method of Claim 6, FURTHER CHARACTERIZED IN THAT each of the sets of codes (50a, 506, ? 
50c) are initially preset at a predetermined size, and further comprising the step of periodically adjusting the 

sizes of the sets, whereby the numbers of characters represented by different digital codes changes as a 
function of the frequency of occurrence of characters in the data presented for encoding. 
55 8. The method of Claim 7, FURTHER CHARACTERIZED IN THAT the step of periodically adjusting the 
sizes of the sets comprises the steps of: 

(1) summing the character counts for predetermined groups of consecutive characters to obtain a 
plurality of group counts (80); 
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(2) comparing the group count (80a) for groups towards the end of the array with the character counts 
(CCa) for characters towards the beginning of the array; 

(3) increasing the size of the set of the shortest digital codes (50a) until there is a correspondence 
between a group count and a character count at the end of the set of shortest digital codes, 
s whereby the size of the set of the shortest digital codes is increased to include more characters having 
a higher probability of occurrence than any the characters in the set of codes associated with the group 
counts. 

9. The method of Claim 8, FURTHER CHARACTERIZED IN THAT there are at least three sets of digital 
codes (50a, 506, 50c). and wherein the set associated with the longest digital codes (50c) increases in size 

/o in increments of the size of the groups while the set associated with the shortest codes (50a) increases in 
size in increments of a single characters. 

10. The method of Claim 9, FURTHER CHARACTERIZED IN THAT the set associated with the shortest 
digital codes (50a) varies in size between 0 and 15. 

11. The method of Claim 9, FURTHER CHARACTERIZED IN THAT the set associated with the longest 
/5 digital codes (50c) varies in size between 0 and 240. 

12. The method of Claim 9. FURTHER CHARACTERIZED IN THAT the shortest digital code is 4 bits, 
and the longest digital code is 12 bits. 

13. The method of Claim 9. FURTHER CHARACTERIZED IN THAT the set associated with the shortest 
digital codes (50a) initially has 4 members and the set associated with the longest digital codes (50c) 

20 initially has 64 members. 

14. The method of Claim 8 t FURTHER CHARACTERIZED IN THAT the step of periodically adjusting 
the sizes of the sets is repeated after the occurrence of a predetermined number of items of data presented 
for encoding. 

15. The method of Claim 14, FURTHER CHARACTERIZED IN THAT the predetermined number of 
25 items between adjustments to the size of the sets is 128. 

16. The method of Claim 8, FURTHER CHARACTERIZED IN THAT the predetermined groups of 
consecutive characters have sixteen characters. 

17. The method of Claim 7, FURTHER CHARACTERIZED IN THAT the step of periodically adjusting 
the sizes of the sets is taken after every k items of data presented for encoding. 

30 18. The method of Claim 17, FURTHER CHARACTERIZED IN THAT k is 128. 

19. The method of Claim 7, FURTHER CHARACTERIZED IN THAT there are n characters in the 
alphabet (45). wherein the set associated with the shortest digital code (50a) has a members, and the set 
associated with the longest digital code (50c) has c members, and FURTHER CHARACTERIZED IN THAT 
the step of periodically adjusting the sizes of the sets comprises changing the relative values of the 

35 numbers a and c by taking the following steps: 

(i) summing the character counts (CC) for consecutive groups (80) of m characters to obtain a 
plurality of n/m group counts; 

(ii) comparing the group count (80) for group (n/m)-l -1 to the character count for character J, where / 
is an integer index of group counts beginning at zero and j is an integer index into the array beginning at 

40 zero at the top of the array; 

(iii) if the group count (80) for group (n/m)-/-1 is less than the character count for character J then, 
incrementing / and / and going back to the step (ii); 

(iv) if the group count (80) for group (n/m)-/-1 is greater than or equal to the character count for 
character;, then establishing an encoding level of a =7 + 1 by assigning / + 1 of the shortest digital codes, 

45 and c = (nfm) x (/ + 1 ) of the longer digital codes. 

20. The method of Claim 19, FURTHER CHARACTERIZED IN THAT there is a set of characters having 
digital codes sized in between the shortest digital code and the longest digital code and having b members, 
and wherein b = n -(/ + 1 ) -((nfm ) x (/ + 1 )). 

21. In a data processing, storage, or communication system wherein data is to be compressed for 
so processing, storage or communication, including a memory means (35) for storing data, a data processor 

(30) for manipulating data, an input circuit (22) for receiving an item of data (70) for encoding, and an output 
circuit (24) for providing an output; a dynamic data compression apparatus, CHARACTERIZED BY: 

said memory means storing an encoding table (T1) comprising a plurality of digital codes (60) 
associated with characters of an alphabet (45), some of the digital codes (80a) being shorter than others 
55 (606); 

said item of data (70) being represented by one of the characters of the alphabet (45); 
said data processor (30) being responsive to the presented item of data (70) for generating a code (60) 
which corresponds to the particular character of the alphabet represented by the presented item of data; 
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said output circuit (24) providing the generated code (60) as an output; and 

said data processor (30) periodically adjusting the corresponding relationship of the codes (60) and the 
characters of the alphabet in the encoding table in said memory means as a function of the frequency of 
occurrence of characters (CC) of the alphabet over a plurality of characters presented for encoding, 
5 whereby as the frequency of occurrence of characters over a plurality of characters presented for 

encoding changes, the more frequently occurring characters become associated with shorter codes in the > 
encoding table, 

22. A data decompression apparatus for decompressing data compressed by the apparatus of Claim 

21, including a second memory means (35) for storing data, a data processor (30) for manipulating data, an ? 
w input circuit (24) for receiving a coded item of data (60) for decoding and an output circuit (22) for providing 

a decoded output output, CHARACTERIZED BY: 

said second memory means (35) storing a decoding table (T2) in parallel to the encoding table <T1) 

used for compression, said decoding table having a plurality of digital codes (60) associated with the 

characters of said alphabet (45) in the manner of the encoding table, some of the digital codes (60a) being 
15 shorter than others (606); 

said data processor (30) being responsive to the presented item of coded data (60) for selecting a 

character (65) from the decoding table (T2) in said memory means which corresponds to the particular 

digital code represented by the presented item of coded data; 

said output circuit (22) providing the selected character as a decoded output; and 
20 said data processor (30) periodically adjusting the corresponding relationship of the codes (60) and the 

characters of the alphabet in the decoding table in said second memory means as a function of the 

frequency of occurrence of characters (CC) of the alphabet over a plurality of characters resultant from 

decoding, 

whereby as the frequency of occurrence of characters over a plurality of characters resultant from 
25 decoding changes, the more frequently occurring characters become associated with shorter codes in the 
decoding table in parallel with the encoding table. 

23. The apparatus of Claim 21. FURTHER CHARACTERIZED IN THAT the characters in the alphabet 
and their corresponding codes are arranged in an initial ordered array in said memory means, with shorter 
codes (60a) towards the beginning of the array and longer codes (60b) toward the end of the array, and 

30 wherein said data processor (30) adjusts the corresponding relationship of the codes and the characters of 
the alphabet in the encoding table in said memory means and FURTHER CHARACTERIZED BY: 

maintaining a character count (CC) associated with each character of the alphabet (45) in said memory 
means (35); 

said data processor (30) incrementing the particular character count associated with the particular 
35 character represented by an item of data presented for encoding after providing a coded output; 

said data processor (30) comparing the particular character count to the character count for a 
preselected character at a position in the array more likely to be associated with a shorter code after 
incrementing the character count and 

said data processor (30) being responsive to the particular character count being greater than the 
40 character count for the preselected character for exchanging the relative positions in the array of the 
character just encoded with the preselected character so as to associate the character having the larger 
character count with a position in the array more likely to be associated with a shorter code. 

24. The apparatus of Claim 23, FURTHER CHARACTERIZED IN THAT the preselected character is 
positioned a predetermined number of memory locations in the array toward the beginning of the array from 

45 the particular character. 

25. The apparatus of Claim 24, FURTHER CHARACTERIZED IN THAT the predetermined number of 
memory locations is sixteen. 

26. The apparatus of Claim 21, FURTHER CHARACTERIZED IN THAT the encoding table comprises a ? 
plurality of sets of digital codes (50a, 50o, 50c), each of the sets comprising digital codes of the same 

50 predetermined length within each set but different between sets, with the sets being arranged in an initial 

ordered array in said memory means associated with the characters of the alphabet beginning with a set 5 
having shorter codes (50a) toward the beginning of the array and a set having larger codes (50c) toward the 
end of the array. 

27. The apparatus of Claim 26, FURTHER CHARACTERIZED IN THAT each of the sets of codes (50) 
65 are initially preset at a predetermined size, and said data processor (30) periodically adjusts the sizes of the 

sets, whereby the numbers of characters represented by different digital codes changes as. a function of the 
frequency of occurrence of characters in the data presented for encoding. 
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28. The apparatus of Claim 27, FURTHER CHARACTERIZED IN THAT: 

said data processor (30) is operative for summing the character counts for predetermined groups of 
consecutive characters to obtain a plurality of group counts (80); 

said data processor (30) is operative for comparing the group count (80) for groups towards the end of 
s the array with the character counts (CO for characters towards the beginning of the array; 

said data processor (30) is operative for increasing the size of the set of the shortest digital codes until 
there is a correspondence between a group count and a character count at the end of the set of shortest 
digital codes. 

whereby the size of the set of the shortest digital codes is increased to include more characters having 
w a higher probability of occurrence than any the characters in the set of codes associated with the group 
counts. 

29. The apparatus of Claim 21, FURTHER CHARACTERIZED IN THAT said apparatus is employed in a 
modem. 

30. In a data processing, storage, or communication system wherein data is to be compressed for 
15 processing, storage or communication, including a data processor (30) for manipulating data an input circuit 

(22) for receiving a character to be transmitted or stored, and an output circuit (24) for providing an output, 
a method for compressing an input data string having a string of m repetitive characters, particularly useful 
for transmitting data with a modem (10) over a communications link (12), CHARACTERIZED BY the steps 
of: 

20 (1) entering a "repeat state" (90) by detecting the occurrence of a predetermined number n 

instances of repetitive characters received by said input circuit (22); 

(2) transmitting data with said output circuit (24) representing the predetermined number n repetitive 
characters; and 

(3) immediately following step (2), transmitting with said output circuit (24) a coded first indicium (91) 
25 representing the number o of occurrences of the repetitive characters subsequent to the n instances, where 

n+o = m. 

31. The method of Claim 30, FURTHER CHARACTERIZED IN THAT the predetermined number n is 
three. 

32. A method of decompressing data compressed with the method of Claim 30, CHARACTERIZED BY 
so the steps of: 

(1) detecting a "repeat state" (90) in received data by detecting the occurrence of a predetermined 
number n instances of repetitive characters; 

(2) providing n characters representing the predetermined number n repetitive characters; and 

(3) following the n repetitive characters, receiving the first indicium (91) representing the number o of 
35 occurrences of the repetitive data subsequent to the n instances, where n +o = m; and 

(4) providing o decoded characters subsequent to the n characters. 

33. The method of Claim 30, FURTHER CHARACTERIZED IN THAT the first indicium (91) represents a 
number between zero and a predetermined numberp characters. 

34. The method of Claim 33, FURTHER CHARACTERIZED IN THAT in response to the first indicium 
40 being p, providing a second coded indicium (91d)representing an additional number of occurrences of 

repetitive characters subsequent to the p instances. 

35. The method of Claim 33, FURTHER CHARACTERISED IN THAT p is fifteen. 



45 



50 



55 



27 



0 283 735 



Modem 1 



I 



12 



Encoding 
Table (Tl) 



Modem 2 



Fig. 1 



I 



Decoding 
Table (T2) 



To/from computer 
^ 23 



Computer interface 
modem circuitry 



Control 
Register 



22 

U 



-10, 15 



27 



24 



20 



Telephone interface 
modem circuitry 



Control 
Register 



28 



Telco 
line^ 



;int 

36 



25' 



Data Bus 



32 



Program memory 
31 



Address Bus 



I 



RAM - 256 x 16 (8 
for count; 8 for 
character) 

Table Tl or T2 



35 



Z7 



1 



Timery 
count- 
er 

33 



A0-A15 Int D0-D7 

Z80 microcomputer 
30 



Data Compression Circuit 



Fig. 2 



0 283 735 



70a 70b 70c 70d 



EXEMPLARY DATA STREAM: 



The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 
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EXEMPLARY DATA STREAM: 

The Hayes Compression algorithm works on the principle of data 
* frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 
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EXEMPLARY DATA STREAM: 

The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 
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EXEMPLARY DATA STREAM: 



The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 



Still on initial encoding 
level : a =4 characters 
encoded with p bits ( p =4) 
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Table Tl before end of second line 
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EXEMPLARY DATA STREAM: 



The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 



. 50a Initial 
encoding level 
=4 
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Table Tl before first level recalculation 
(after the "y" in "fly" in line 3): 
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Table Tl after first level recalculation: 



l 
2 



sp 


8 




Ap 




0 




3 


3 


a 




P 






i@ 




vP 




+- 




+0 








+P 




+' 


+P 


e 


6 


A H 






i 




1 




A 




q 




Q 


vA 




>Q 




+! 




+1 




i-A 




+Q 




+a 


+q 


s 


5 


r 3 


A R 








2 




B 




R 




b 


vB 




vR 




+" 




+2 




+B 




+R 




+b 


+r 


t 


5 


c 2 


i *c 




A S 




# 




C 




3 




S 


vC 




vS 




+# 




+3 




+C 




+S 




+c 


+s 


a 


4 




A T 




$ 




T 


1 


d 


1 


4 




D 


vD 




vT 




+$ 




+4 




+D 




+T 




+d 


+t 


i 


4 


A E 


A U 




% 




u 


1 


5 




E 




U 


vE 




vU 




+% 




+5 




+E 




+U 




+e 


+u 


0 


3 


A V 


& 




6 




f 


1 


F 




V 




V 


vF 




/V 




+& 




+6 




+F 




+V 




+f 


+v 


*M 


1 


A W 


t 




7 




G 




g 




w 




W 


vG 




/W 




+' 




+7 




+G 




hW 




+g 


+w 


\T 


1 


h : 


! A X 




( 




H 




8 




X 




X 


vH 




/X 




+( 




+8 




4-H 




+-X 




+h 


+x 


*I 




A Y 


) 




9 




I 




y 


1 


Y 




j 


vl 




vY 




+) 




+9 




+1 




-t-Y 




+i 


+7 


^G 




A B 














J 




Z 




z 


vJ 




vZ 




+* 




+: 




+J 




+Z 




+j 


+z 


K K 




A [ 


+ 




t 




K 




k 




[ 




{ 

1 


vK 




v[ 




4-+ 




+; 




+K 




■K 




+k 


+{ 


K L 




A \ 


» 




1 


i 


< 




L 




\ 




vL 




v\ 




+. 




K 




+L 




+\ 




+1 


+1 


"A 




A l 






m 


i 






M 




] 




} 


vM 




v] 




+- 




+= 




+M 




+] 




+m 


+} 


V N 




n 2 


AA 




V 




> 




N 




A 






vN 




V A 




+. 




+> 




+1S 




+* 




+n 


+~ 


A F 




A 0 




1 


7 




? 




O 








ro 


vO 




v_ 




f/ 




+? 




fO 




f_ 




+0 


+' 


i=0 




1 


2 . . 












11 










* * 


13 


14 


15 



Encoded with q bits : b~ 256 - a -c 
"= 120 characters (q = 8) 



, Encoded with r bits : 
c = 16a = 128 characters 
( r = 12) 



0 283 735 



EXEMPLARY DATA STREAM: 

The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 

FIG. 13 

Table Tl before end of third line: 
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Table Tl after end of third line: 
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EXEMPLARY DATA STREAM: 



The Hayes Compression algorithm works on the principle of data 
frequency statistics. These statistics are accumulated on the 
fly, eliminating the need to pre-scan the data to determine its 
characteristics. Data that occurs more frequently is represented 
in fewer bits than data that occurs less frequently. 



Beginning 
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Table Tl before second level recalculation 
(after the "in" on line 5) : 
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Table Tl after second level recalculation: 
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FIG. 17 

Repetitive character string compression 
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© A method and apparatus for compressing data, 
particularly useful in a communication system which 
employs modems (10, 15). The preferred method is 
implemented in a microprocessor (30) within a 
modem (10, 15), and dynamically adapts to chang- 
ing data statistics. Parallel encoding (T1) and decod- 
ing (T2) tables are provided in a memory (35) at an 
encoding modem (10) and a decoding modem (15), 
and are updated for each character processed. Each 
table (T1, T2) has a plurality of digital compression 
codes associated with the characters of an alphabet. 
In response to an item of data presented for encod- 
ing, a compression code which corresponds to the 
character presented for encoding, is selected using 
the encoding table (T1). The selected compression 
code is provided as an output over a communication 
line (12). Periodically, the association between the 
codes and the characters of the alphabet in the 
tables (T1, T2) is adjusted as a function of the 
frequency of occurrence of characters of the al- 
phabet, over a plurality of characters, as maintained 
by. a character count As the frequency of occurrence 
of characters presented for encoding changes, the 
more frequently occurring characters become asso- 
ciated with the shorter compression codes in the 
encoding table (T1). By performing the same steps 
in parallel at the decoding table (T2) at the decoding 



modem (15), the compression codes coming in over 
the communication line (12) are used at the decoder 
to select a character of the alphabet, which is pro- 
vided as a decoded output. 
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